Skip to content

Conversation

@jeckersb
Copy link
Collaborator

@jeckersb jeckersb commented Aug 1, 2025

This adds a new kernel::Cmdline struct, which is populated either
via Cmdline::from (borrowed) or Cmdline::from_proc (owned).

This attempts to follow the same behavior as the kernel, which is
mostly covered in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227

The algorithm is basically:

  • Scan forward until you find the first unquoted isspace()
    byte. That's the end of the param.
  • If you encounter an = along the way, note where. That's where it
    will terminate the key and split for the value. Any future = are
    not treated as special.
  • The value can be quoted to allow spaces, but is unquoted only in as
    much as " is removed from the first or last byte. You can still
    have " in the middle of the value.

This operates on &[u8] because the kernel does not enforce any
particular encoding for the cmdline. Iterating using
Cmdline::iter() will emit the Parameter type, which has helper
methods key_lossy() and value_lossy() to convert
potentially-non-UTF8 data into Strings.

Resolves: #1425
Signed-off-by: John Eckersberg jeckersb@redhat.com

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new, more robust kernel command line parser, kernel::Cmdline, which correctly handles quoted values. The new parser is used throughout the codebase, replacing the previous simpler implementation. The changes are well-tested and improve correctness. I've provided a few suggestions in kernel.rs to improve code conciseness and use more idiomatic Rust patterns.

@jeckersb jeckersb force-pushed the kernel-cmdline-v2 branch from e021bb8 to f56e9d4 Compare August 1, 2025 17:00
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane just some style nits. Thanks for working on this!

impl<'a, T: AsRef<[u8]> + ?Sized> From<&'a T> for Parameter<'a> {
fn from(input: &'a T) -> Self {
let input = input.as_ref();
let equals = input.iter().position(|b| *b == EQUALS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a lot cleaner to use split_once for this. There are several other examples in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_once is nightly-only for slices though ☹️

https://doc.rust-lang.org/std/primitive.slice.html#method.split_once

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, annoying.

You know I think the bigger picture thing here is that almost everyone doing parsing of u8 is using a higher level library and so there's not much motivation to improve std here. There's a lot of those.

One most relevant for this use case is BStr which has https://docs.rs/bstr/latest/bstr/trait.ByteSlice.html#method.split_once_str

bstr is a pretty good fit for this, it's not in our depchain today but that's just because we don't use regex among other things.

OK as is but perhaps worth a comment like

// https://doc.rust-lang.org/std/primitive.slice.html#method.split_once is nightly only right now
// We might consider using bstr in the future too

or so?

@cgwalters
Copy link
Collaborator

Not a blocker but something to discuss I think raised by this that we will want to address is that we still have overlap/duplication with at least the kargs parsing in composefs-boot. So one option is to move this there (and also make it pub). The other possibility (which I keep thinking more about) is a "friendly" fork of composefs-boot that lives in this repo; I think we may end up needing that really for things like #1469 among others.

@jeckersb jeckersb force-pushed the kernel-cmdline-v2 branch from f56e9d4 to d650114 Compare August 1, 2025 17:43
This adds a new `kernel::Cmdline` struct, which is populated either
via `Cmdline::from` (borrowed) or `Cmdline::from_proc` (owned).

This attempts to follow the same behavior as the kernel, which is
mostly covered in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227

The algorithm is basically:
- Scan forward until you find the first unquoted isspace()
  byte. That's the end of the param.
- If you encounter an `=` along the way, note where. That's where it
  will terminate the key and split for the value. Any future `=` are
  not treated as special.
- The value can be quoted to allow spaces, but is unquoted only in as
  much as `"` is removed from the first or last byte. You can still
  have `"` in the middle of the value.

This operates on `&[u8]` because the kernel does not enforce any
particular encoding for the cmdline.  Iterating using
`Cmdline::iter()` will emit the `Parameter` type, which has helper
methods `key_lossy()` and `value_lossy()` to convert
potentially-non-UTF8 data into `String`s.

Resolves: bootc-dev#1425
Signed-off-by: John Eckersberg <jeckersb@redhat.com>
@jeckersb jeckersb force-pushed the kernel-cmdline-v2 branch from d650114 to cf7f150 Compare August 1, 2025 17:43
@cgwalters cgwalters merged commit 1532e74 into bootc-dev:main Aug 1, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clean up kernel argument parsing

2 participants